Apparatus and method for performing Montgomery type modular multiplication

ABSTRACT

Disclosed is a modular multiplication apparatus for high-speed encryption/decryption and electronic signature in a mobile communication environment including smart cards and mobile terminals. The present invention provides an apparatus for performing Montgomery type modular multiplication that calculates A·B·R⁻¹ mod N (where R=4^(m+2)) in m+2 clocks (where m=n/2), with the multiplier A and the multiplicand B, each having n bits, as its inputs, wherein bits of the multiplier are sequentially shifted to generate a shifted bit string and the two least significant bits of the generated bit string are Booth-recorded. The present invention provides a high-speed modular multiplication apparatus with fewer gates and reduced power consumption.

PRIORITY

[0001] This application claims priority to an application entitled “APPARATUS AND METHOD FOR PERFORMING MONTGOMERY TYPE MODULAR MULTIPLICATION”, filed in the Korean Intellectual Property Office on Mar. 14, 2003 and assigned Serial No. 2003-16100, the contents of which are hereby incorporated by reference.

BACKGROUND OF THE INVENTION

[0002] 1. Field of the Invention

[0003] The present invention relates generally to the field of cryptography, and more particularly to an apparatus and method for performing a Montgomery type modular multiplication for use in the encryption/decryption of information and in digital signature technology.

[0004] 2. Description of the Related Art

[0005] In communication systems using smart cards and cyber money for electronic commerce, mobile communication devices such as cellular telephones, small-sized computers, etc., it is desirable to transport information (electronic text or data) safely by encrypting/decrypting the information or conducting a digital signature process for the information. Here, the term “digital signature” refers to a technique that “signs” electronic texts with an electronic signature in an electronic exchange of information, similar to that done conventionally on paper. With the rapid increase of the number of Internet users and the frequent transmission of personal information over the Internet, there is a vital need for safe transmission of information through unsecured channels.

[0006] Various proposed algorithms such as RSA (Rivest-Shamir-Adleman), ElGamal, Schnorr, etc., have been employed for the encryption/decryption techniques and the digital signature technology using a public key system. The RSA algorithm-based ISO (International Organization for Standardization)/IEC (International Electrotechnical Commission) 9796 has been adopted as an international standard of these algorithms, DSA (Digital Signature Algorithm) as a modification of ElGamal has been adopted in the U.S.A., GOSSTANDART (commonly abbreviated as “GOST”) has been adopted in Russia, and KC-DSA has been adopted in Korea. However, various communication systems in current use have adopted many PKCSs (Public Key Cryptography Standards). The above-mentioned algorithms require modular exponentiation, m^(e) mod N, which is carried out by repeated modular multiplication, A·B mod N.
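The dependence of modular exponentiation on repeated modular multiplication can be illustrated with a minimal sketch. The function name mod_exp and the left-to-right binary method are assumptions chosen for illustration; the text does not state which exponentiation method is used.

```python
def mod_exp(m: int, e: int, N: int) -> int:
    """Left-to-right binary square-and-multiply: returns m**e mod N (e >= 0)."""
    result = 1 % N
    for bit in bin(e)[2:]:                  # exponent bits, most significant first
        result = (result * result) % N      # one modular multiplication (squaring)
        if bit == "1":
            result = (result * m) % N       # one more modular multiplication
    return result
```

Every loop iteration costs one or two modular multiplications, so for an exponent of 512 or 1024 bits the speed of A·B mod N dominates the total run time.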

[0007] Many algorithms which perform the modular multiplication required to generate and verify a digital signature based on a public key cipher such as RSA have been proposed, for example, R. L. Rivest et al., “A Method For Obtaining Digital Signatures And Public-Key Cryptosystems,” Communications of the ACM, Vol. 21, pp. 120-126, 1978; P. L. Montgomery, “Modular Multiplication Without Trial Division,” Math. of Comp., Vol. 44, No. 170, pp. 519-521, 1985; S. R. Dusse and B. S. Kaliski Jr., “A Cryptographic Library For The Motorola DSP56000,” Proc. Eurocrypt '90, pp. 230-244, 199?, Springer-Verlag; and A. Bosselaers, R. Govaerts and J. Vandewalle, “Comparison Of Three Modular Reduction Functions,” Advances in Cryptology-CRYPTO '93, pp. 175-186, 1993. According to D. R. Stinson, “Cryptography”, CRC Press, 1995, the Montgomery algorithm has been found to be the most efficient of these algorithms for the modular multiplication used in the modular exponentiation that various algorithms require, but it is not an efficient algorithm for a single modular multiplication. U.S. Pat. No. 6,185,596 discloses an example of an apparatus implemented by the Montgomery algorithm.

[0008] As mentioned above, many algorithms and architectures have been proposed for public key encryption/decryption and electronic signature. However, since modular multiplication apparatuses according to most of the proposed algorithms and architectures are designed for high-speed public key encryption/decryption, they have a disadvantage in that a great number of gates are required and a large amount of power is consumed.

SUMMARY OF THE INVENTION

[0009] Therefore, the present invention has been made in view of the above problems, and it is an object of the present invention to provide a modular multiplication apparatus with fewer gates for high-speed encryption/decryption and electronic signature in a mobile communication environment including smart cards and mobile terminals.

[0010] It is another object of the present invention to provide a modular multiplication apparatus with reduced power consumption for high-speed encryption/decryption and electronic signature in a mobile communication environment including smart cards and mobile terminals.

[0011] It is still another object of the present invention to provide a modular multiplication apparatus, which enables high-speed encryption/decryption and electronic signature in a mobile communication environment including smart cards and mobile terminals.

[0012] To achieve the above objects, the modular multiplication apparatus for implementing an information encryption/decryption technique in which a message (A) is encrypted/decrypted using a first key (B) and a second key (N), comprises a storage having separate regions which store the message, the first key, and the second key, each of n bits in length; a recording logic which generates a first n+4 bit signal using the message and the first key at each clock pulse; a first carry save adder which generates a 3 bit sequence consisting of one carry value and two sum values using the first n+4 bit signal and two parallel n+4 bit input signals; a quotient logic which generates a 3 bit determiner for determining a modular reduction multiple using the 3 bit sequence and one carry value; a selector which generates a second n+4 bit signal using the second key and the 3 bit determiner; a second carry save adder which outputs a pair of sum and carry values using the second n+4 bit signal, and respective sum and carry terms output from the first carry save adder; and a first full adder which outputs a carry input value by performing a full addition operation with the pair of sum and carry values and a carry value outputted from the quotient logic at a previous clock. The separate regions are shift registers for the respective message, first key, and second key. The message is right-shifted by 2 bit positions at every clock. The first n+4 bit signal is one of 0, B, 2B, −B, and −2B. The second n+4 bit signal is one of 0, N, 2N, −N, and −2N.

[0013] The recording logic comprises a Booth recording circuit for performing Booth recoding with two low order bits of the message; a multiplexer for multiplexing the two low order bits and the first key so as to output one of 0, B, and 2B; and a one's complement operator for selectively performing a one's complement operation on the n+1 bit signal output from the multiplexer according to the two low order bits so as to generate one of 0, B, 2B, −B, and −2B.

[0014] The first carry save adder comprises n+4 second full adders, each performing a full addition operation with corresponding sum and carry bits of the two parallel n+4 bit input signals and a corresponding bit of the first n+4 bit signal so as to produce the 3 bit sequence. A first one of the two parallel n+4 bit input signals is created by selecting the high order n+2 bits from a sum term of the second carry save adder and inserting 2 bits as higher order bits of the selected n+2 bits, the two higher order bits being zeros. A second one of the two parallel n+4 bit input signals is created by selecting the higher order n+3 bits from a carry term of the second carry save adder and inserting one bit as a higher order bit of the selected n+3 bits, the one higher order bit being zero.

[0015] The quotient logic comprises a D flip-flop for temporarily storing the carry input value from the first full adder; a third full adder for performing a full addition operation on the carry input value and a sum value outputted from a least significant full adder of the first carry save adder in consideration of a sign of the first key; an exclusive OR (XOR) logic gate for performing an exclusive OR operation on the carry value outputted from the least significant full adder of the first carry save adder, a sum value outputted from a second least significant full adder of the first carry save adder, and the carry value outputted from the quotient logic at a previous clock; and a combinational circuit for combining the outputs of the third full adder and the exclusive OR logic gate and a second least significant bit (n1) of the second key so as to output the 3 bit determiner signal.

[0016] The second carry save adder comprises n+4 fourth full adders, each performing a full addition operation with corresponding sum and carry bits from the first carry save adder, except for the least significant sum bit and the most significant carry bit, and a corresponding bit of the second n+4 bit signal so as to produce the pairs of sum and carry bits. The first full adder performs full addition with a sum value output from a second least significant full adder of the second carry save adder, a carry value outputted from a least significant full adder of the second carry save adder, and the carry value (cin) of the quotient logic at a previous clock so as to produce the carry input value.

[0017] The modular multiplication apparatus further comprises a carry propagation adder for performing a carry propagation addition operation with the sum and carry terms outputted from the second carry save adder after m+2 clocks, where m=n/2. The carry propagation adder adds the second key (the modulus N) to the result of the carry propagation addition operation if the output of the carry propagation adder is a negative value.

[0018] In another aspect of the present invention, the modular multiplication device for implementing a message encryption/decryption technique in which the message (A) is encrypted/decrypted using a first key (B) and a second key (N), comprises: a storage having separate regions for storing the message, the first key, and the second key of n bits; a recording logic for generating a first n+3 bit signal using the message and the first key at each clock; a first carry save adder for outputting a 3 bit sequence consisting of one carry value and two sum values by performing a first carry save addition operation with the first n+3 bit signal and two parallel n+3 bit input signals; a quotient logic for generating a 2 bit determiner for determining a modular reduction multiple by performing a quotient operation with the 3 bit sequence and one carry value; a selector for generating a second n+3 bit signal using the second key and the 2 bit determiner; a second carry save adder for outputting a pair of sum and carry values by performing a second carry save addition operation with the second n+3 bit signal, and respective sum and carry terms output from the first carry save addition operation; and an AND logic gate for outputting a carry input value by performing an AND operation with the pair of sum and carry values. The separate regions are shift registers for the respective message, first key, and second key. The message is shifted by 2 positions at every clock. The first n+3 bit signal is one of 0, B, 2B, and 3B. The second n+3 bit signal is one of 0, N, 2N, and 3N.

[0019] The recording logic is a multiplexer which multiplexes two lower bits of the message and n bits of the first key so as to output the first n+3 bit signal. The first carry save adder comprises n+3 first full adders, each performing a full addition operation with corresponding sum and carry bits of the two parallel n+3 bit input signals and a corresponding bit of the first n+3 bit signal so as to produce the 3 bit sequence. A first one of the two parallel n+3 bit input signals is created by selecting the high order n+1 bits from a sum term of the second carry save adder and inserting two bits as higher order bits of the selected n+1 bits, the two higher order bits being zeros. A second one of the two parallel n+3 bit input signals is created by selecting the higher order n+2 bits from a carry term of the second carry save adder and inserting 1 bit as a higher order bit of the selected n+2 bits, the one higher order bit being zero.

[0020] The quotient logic comprises a D flip-flop for temporarily storing the carry input value from the AND logic; a half adder for performing a half addition operation with the carry input value and a sum value outputted from the least significant full adder of the first carry save adder; an exclusive OR (XOR) logic gate for performing an exclusive OR operation with a carry value outputted from the least significant full adder of the first carry save adder, a sum value outputted from a second least significant full adder, and an output of the half adder; and a combinational circuit for combining the outputs of the half adder and the exclusive OR logic and a second least significant bit (n1) of the second key so as to output the 2 bit determiner signal.

[0021] The second carry save adder comprises n+3 second full adders, each performing a full addition operation with corresponding sum and carry bits from the first carry save adder, except for the least significant sum bit and the most significant carry bit, and a corresponding bit of the second n+3 bit signal so as to produce the pairs of sum and carry bits. The AND logic performs an AND operation with a sum value outputted from a second least significant second full adder of the second carry save adder and a carry value outputted from a least significant second full adder of the second carry save adder so as to produce the carry input value. The modular multiplication apparatus further comprises a carry propagation adder for performing a carry propagation addition operation with the sum and carry terms outputted from the second carry save adder after m+2 clocks.

[0022] In another aspect of the present invention, the modular multiplication method for implementing a message encryption/decryption technique in which a message (A) is encrypted/decrypted using a first key (B) and a second key (N), comprises storing the message, first key, and second key of n bits in respective storages; generating a first n+4 bit signal using the message and first key at each clock; outputting a 3 bit sequence consisting of one carry value and two sum values by performing a first carry save addition operation with the first n+4 bit signal and two parallel n+4 bit input signals; generating a 3 bit determiner for determining a modular reduction multiple by performing a quotient operation with the 3 bit sequence and one input carry value; generating a second n+4 bit signal using the second key and the 3 bit determiner; outputting a pair of sum and carry values by performing a second carry save addition operation with the second n+4 bit signal, and respective sum and carry terms output from the first carry save addition operation; and outputting a carry input value by performing a full addition operation with the pair of sum and carry values and a carry value outputted from the quotient logic at a previous clock. The message is right-shifted by 2 bits at every clock.

[0023] The procedure of generating the first n+4 bit signal includes performing Booth recording with two low order bits of the message; and generating one of 0, B, 2B, −B, and −2B according to the two low order bits. A first one of the two parallel n+4 bit input signals is created by selecting the high order n+2 bits from a sum term output by the second carry save addition operation and inserting 2 bits as higher order bits of the selected n+2 bits, the two higher order bits being zeros. A second one of the two parallel n+4 bit input signals is created by selecting the higher order n+3 bits from a carry term of the second carry save addition operation and inserting one bit as a higher order bit of the selected n+3 bits, the one higher order bit being zero. The 3 bit sequence includes two sum values and one carry value. The two sum values are a least significant bit and a second least significant bit of a sum term outputted from the first carry save addition operation, and the one carry value is a least significant bit of a carry term outputted from the first carry save addition operation. The one input carry value is the carry input value generated by the full addition operation. The second n+4 bit signal is selected from among 0, N, 2N, −N, and −2N according to two low order bits of the 3 bit determiner. The pair of sum and carry values are a second least significant bit of a sum term and a least significant bit of the carry term output from the second carry save addition operation. The most significant bits of the sum and carry terms outputted from the first carry save addition operation are ignored. The modular multiplication method further comprises performing a carry propagation addition operation with the sum and carry terms after m+2 clocks, where m=n/2. The modular multiplication method further comprises adding the second key (the modulus N) if the output of the carry propagation addition operation is a negative value.

[0024] In still another aspect of the present invention, a modular multiplication method for implementing a message encryption/decryption technique in which a message (A) is encrypted/decrypted using a first key (B) and a second key (N), comprises storing the message, first key, and second key of n bits in respective storages; generating a first n+3 bit signal using the message and first key at each clock; outputting a 3 bit sequence consisting of one carry value and two sum values by performing a first carry save addition operation with the first n+3 bit signal and two parallel n+3 bit input signals; generating a 2 bit determiner for determining a modular reduction multiple by performing a quotient operation with the 3 bit sequence and one input carry value; generating a second n+3 bit signal using the second key and the 2 bit determiner; outputting a pair of sum and carry values by performing a second carry save addition operation with the second n+3 bit signal, and respective sum and carry terms outputted from the first carry save addition operation; and outputting a carry input value by performing an AND operation with the pair of sum and carry values. The message is right-shifted by 2 bits at every clock. The first n+3 bit signal is produced by multiplexing two low order bits of the message and the first key. The first n+3 bit signal is one of 0, B, 2B, and 3B. A first one of the two parallel n+3 bit input signals is created by selecting the high order n+1 bits from a sum term of the second carry save addition operation and inserting two bits as higher order bits of the selected n+1 bits, the two higher order bits being zeros. A second one of the two parallel n+3 bit input signals is created by selecting the higher order n+2 bits from a carry term of the second carry save addition operation and inserting 1 bit as a higher order bit of the selected n+2 bits, the one higher order bit being zero.

[0025] The 3 bit sequence includes two sum values and one carry value. The two sum values are a least significant bit and a second least significant bit of a sum term output from the first carry save addition operation, and the one carry value is a least significant bit of a carry term output from the first carry save addition operation. The one input carry value is the carry input value generated by the AND operation. The second n+3 bit signal is selected from among 0, N, 2N, and 3N according to the 2 bit determiner. The pair of sum and carry values are a second least significant bit of a sum term and a least significant bit of the carry term outputted from the second carry save addition operation. The most significant bits of the sum and carry terms outputted from the first carry save addition operation are ignored.

[0026] The modular multiplication method further comprises performing a carry propagation addition operation with the sum and carry terms outputted from the second carry save addition operation after m+2 clocks.

BRIEF DESCRIPTION OF THE DRAWINGS

[0027] The above and other objects, features and other advantages of the present invention will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:

[0028] FIG. 1 is a block diagram showing a configuration of a modular multiplication apparatus in accordance with a first embodiment of the present invention;

[0029] FIG. 2 is a block diagram showing a detailed configuration of a recording circuit shown in FIG. 1;

[0030] FIG. 3 is a block diagram showing a detailed configuration of the first carry save adder shown in FIG. 1;

[0031] FIG. 4 is a block diagram showing a detailed configuration of the quotient logic shown in FIG. 1;

[0032] FIG. 5 is a block diagram showing a detailed configuration of the second carry save adder shown in FIG. 1;

[0033] FIG. 6 is a block diagram showing a detailed configuration of the full adder shown in FIG. 1;

[0034] FIG. 7 is a block diagram showing a configuration of a modular multiplication apparatus in accordance with a second embodiment of the present invention;

[0035] FIG. 8 is a block diagram showing a detailed configuration of a recording circuit shown in FIG. 7;

[0036] FIG. 9 is a block diagram showing a detailed configuration of the first carry save adder shown in FIG. 7;

[0037] FIG. 10 is a block diagram showing a detailed configuration of the quotient logic shown in FIG. 7;

[0038] FIG. 11 is a block diagram showing a detailed configuration of the second carry save adder shown in FIG. 7;

[0039] FIG. 12 is a diagram showing a detailed configuration of the full adder shown in FIG. 7; and

[0040] FIG. 13 is a block diagram showing an example of application of the modular multiplication apparatuses in accordance with the embodiments of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

[0041] Preferred embodiments of the present invention will now be described in detail with reference to the annexed drawings. In the drawings, the same or similar elements are denoted by the same reference numerals even though they are depicted in different drawings. In the following description, a detailed description of known functions and configurations incorporated herein will be omitted when it may obscure the subject matter of the present invention.

[0042] A. Outline of the Invention

[0043] In the following description, the present invention discloses an apparatus and method for performing a modular multiplication, A·B mod N, by using a Montgomery algorithm, where

A = a_(n−1)·2^(n−1) + . . . + a₁·2 + a₀,

B = b_(n−1)·2^(n−1) + . . . + b₁·2 + b₀, and

N = n_(n−1)·2^(n−1) + . . . + n₁·2 + n₀.

[0044] Here, A is a multiplier, B is a multiplicand, and N is a modulus, the bit size of each of which can be large, for example, 512 or 1024.

[0045] The modular multiplication, A·B mod N, is implemented by two embodiments, which will be described. Each embodiment suggests a modular multiplication apparatus and method for calculating A·B·R⁻¹ mod N in m+2 clocks with A, B and N (where R=4^(m+2), m=n/2, and −N ≤ A, B < N), each being n bits in length, being received as inputs. A·B mod N can be calculated by using a multiplication result of the suggested modular multiplication apparatus. The modular exponentiation, m^(e) mod N, which is required to perform the RSA operation, can be derived from the calculated A·B mod N. FIGS. 1 to 6 of the drawings are block diagrams showing the configuration of the elements of the modular multiplication apparatus in accordance with a first embodiment of the present invention, and FIGS. 7 to 12 are block diagrams showing the configuration of the elements of the modular multiplication apparatus in accordance with a second embodiment of the present invention. FIG. 13 is a block diagram of an IC card to which the modular multiplication apparatuses in accordance with the embodiments of the present invention are applicable.
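How A·B mod N may be recovered from the A·B·R⁻¹ mod N result can be sketched as follows. The text does not spell out this step; the second pass with the constant R² mod N is one common technique and is an assumption here, as are the function names mont_product and mod_mul_via_montgomery. The function mont_product is only a software stand-in for the output of the apparatus.

```python
def mont_product(a: int, b: int, N: int, R: int) -> int:
    """Stand-in for the apparatus output: a*b*R^-1 mod N (N must be odd)."""
    return (a * b * pow(R, -1, N)) % N      # pow(R, -1, N) needs Python 3.8+


def mod_mul_via_montgomery(a: int, b: int, N: int, n: int) -> int:
    """Plain a*b mod N obtained from two Montgomery products."""
    m = n // 2
    R = 4 ** (m + 2)                        # R = 4^(m+2), as defined in the text
    r2 = (R * R) % N                        # assumed constant, computed once per modulus
    t = mont_product(a, b, N, R)            # t = a*b*R^-1 mod N
    return mont_product(t, r2, N, R)        # t*R^2*R^-1 = a*b mod N
```

In an exponentiation the extra pass is usually paid only once at the start and once at the end, since intermediate values can stay in the scaled form.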

[0046] Embodiments of the present invention provide modular multiplication apparatuses in which bits of the multiplier are sequentially shifted to generate a shifted bit string and the two lower bits of the generated bit string are Booth-recorded. In contrast with conventional modular multiplication apparatuses, wherein only a single lower bit generated by sequentially shifting bits of the multiplier is recorded, the present invention allows the multiplication to be performed at higher speeds by recording two lower bits at a time. The modular multiplication apparatuses in accordance with the embodiments of the present invention include modified recording logics and other elements configured in compliance with the modified recording logics for performing the modular multiplication operation according to the Montgomery algorithm.

[0047] B. First Embodiment

[0048] B-1. Configuration of the Invention

[0049] FIG. 1 is a block diagram showing a configuration of a modular multiplication apparatus in accordance with the first embodiment of the present invention.

[0050] Referring to FIG. 1, the modular multiplication apparatus includes a recording logic 110, a first carry save adder (hereinafter, abbreviated as “CSA1”) 120, a quotient logic 130, a selector 140, a second CSA (“CSA2”) 150, and a full adder (FA) 160. The modular multiplication apparatus is a hardware device for calculating A·B·R⁻¹ mod N in m+2 clocks according to a Montgomery algorithm with A, B and N (where R=4^(m+2), m=n/2, and −N ≤ A, B < N), each having n bits, as its inputs. Since R=4^(m+2)=2^(n+4), the modular multiplication apparatus calculates A·B·2^(−(n+4)) mod N.
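The overall computation can be sketched as a behavioral model working on ordinary integers rather than on carry-save signals. This is an illustration under stated assumptions, not the circuit of FIG. 1: the names booth_digits and montgomery_radix4 are hypothetical, the multiplier digits follow the usual radix-4 Booth convention, and the reduction multiple q is chosen only from {−1, 0, 1, 2}, whereas the selector in the text may also output −2N.

```python
def booth_digits(A, n):
    """Radix-4 Booth digits z_0 .. z_(m+1) of a signed A with |A| < 2**n (m = n/2)."""
    digits, prev = [], 0                      # prev plays the role of a_(-1) = 0
    for i in range(n // 2 + 2):
        lo = (A >> (2 * i)) & 1               # Python's >> sign-extends negative A
        hi = (A >> (2 * i + 1)) & 1
        digits.append(lo + prev - 2 * hi)     # digit in {-2, -1, 0, 1, 2}
        prev = hi
    return digits


def montgomery_radix4(A, B, N, n):
    """Returns A*B*4**-(m+2) mod N for odd N < 2**n and -N <= A, B < N."""
    S = 0
    for z in booth_digits(A, n):              # one digit per clock, m+2 clocks
        S += z * B                            # add one of 0, +-B, +-2B
        q = (-S * N) % 4                      # an odd N is its own inverse mod 4
        if q == 3:
            q = -1                            # use -N rather than 3N
        S = (S + q * N) >> 2                  # S + q*N is a multiple of 4
    return S + N if S < 0 else S              # final correction of a negative result
```

Each loop pass corresponds to one clock: two bits of A are consumed and the running value is divided by 4, which is why m+2 clocks suffice for an n-bit multiplier.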

[0051] Each of the CSAs 120 and 150 is composed of (n+4) full adders in parallel, each of which has a 3 bit input and outputs a carry bit and a sum bit. The recording logic 110 performs a modified Booth recording operation based on the multiplier A and outputs one of the values 0, ±B, and ±2B as a sign-extended value of (n+4) bits. The quotient logic 130 has as its inputs a least significant bit (LSB) carry value C_(1,0) and the two least significant sum bits S_(1,1) and S_(1,0) from the CSA1 120, a carry-in, and a sign bit of B, and outputs 3 bits q₂q₁q₀, which is a value for determining a multiple of the modular reduction. The selector 140, which can be implemented by multiplexers (MUXs), selects and outputs one of 0, ±N, and ±2N based on the determined value of q. The full adder 160 performs a full add operation, with the two bits S_(2,1) and C_(2,0) output from the CSA2 150 and a carry value cin as its inputs, and provides the result of the full add to the quotient logic 130 as a carry-in signal.

[0052] Although not shown in detail in FIG. 1, it should be noted that the modular multiplication apparatus includes temporary storing registers C and R for storing carry values and sum values, which are the outputs of the CSA1 120 and CSA2 150, respectively, for each clock, and a carry propagation adder for adding the values stored in the temporary storing registers C and R and outputting the resultant value as the result of the modular multiplication.
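The division of work between the carry save adders and the carry propagation adder can be illustrated with a generic sketch. The function name carry_save_add and the fixed word width are assumptions; the bit indexing and wiring of CSA1 120 and CSA2 150 differ in detail from this simplified form.

```python
def carry_save_add(x: int, y: int, z: int, width: int):
    """Compress three width-bit words into a sum word and a carry word."""
    mask = (1 << width) - 1
    s = (x ^ y ^ z) & mask                              # per-bit sum of a full adder
    c = (((x & y) | (y & z) | (x & z)) << 1) & mask     # per-bit carry, moved up one place
    return s, c


# No carry ripples between bit positions above; a single carry propagation
# addition at the very end resolves the redundant (sum, carry) pair.
s, c = carry_save_add(0b1011, 0b0110, 0b1101, 8)
total = (s + c) & 0xFF
```

Because every bit position is handled independently, the delay per clock does not grow with n, and the slow carry propagation is paid only once after the last clock.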

[0053] FIG. 2 is a block diagram showing a detailed configuration of the recording logic 110 shown in FIG. 1.

[0054] Referring to FIG. 2, the recording logic 110 Booth-records the two lower bits of a bit string generated by sequentially shifting bits of the multiplier A, multiplexes a result of the Booth recording with the multiplicand B, and outputs signed binary numbers of (n+4) bits. For this purpose, a shift register 102 for sequentially shifting bits of the multiplier to generate a shifted bit string and a register 104 for storing the multiplicand are provided at the front stage of the recording logic 110. The recording logic 110 also includes a Booth recording circuit 112, a multiplexer (MUX) 114, and a one's complementer 116. The Booth recording circuit 112 Booth-records the two lower bits a_(i+1) and a_(i) of the generated bit string. The multiplexer 114 multiplexes the result z_(i+1) of the Booth recording with the multiplicand, and outputs one of 0, B and 2B as a result of the multiplexing. The one's complementer 116 performs a one's complement operation on the output of the multiplexer 114 according to the two lower bits of the generated bit string, and outputs signed binary numbers of (n+4) bits. The recording logic 110, which is a circuit for implementing a modified Booth recording based on the multiplier A, outputs a sign-extended value of (n+4) bits, which is one of the values 0, ±B, and ±2B.

[0055] FIG. 3 is a block diagram showing a detailed configuration of the CSA1 120 shown in FIG. 1.

[0056] Referring to FIG. 3, the CSA1 120 having (n+4) full adders 121 to 125 has as its inputs first signals S_(2,2) to S_(2,n+3) of (n+2) bits, second signals C_(2,1) to C_(2,n+3) of (n+3) bits, and third signals B₀ to B_(n+3) being the binary numbers of (n+4) bits from the recording logic 110, and full-adds the inputs by means of the (n+4) full adders 121 to 125 to output carry values C_(1,0) to C_(1,n+3) and sum values S_(1,0) to S_(1,n+3) of (n+4) bits. Here, an (n+2)th higher bit S_(2,n+3) of the first signals is input to the three higher full adders 123 to 125, and an (n+3)th higher bit C_(2,n+3) of the second signals is input to the two higher full adders 124 and 125.

[0057] FIG. 4 is a block diagram showing a detailed configuration of the quotient logic 130 shown in FIG. 1.

[0058] Referring to FIG. 4, the quotient logic 130 has as its inputs the sum values S_(1,0) and S_(1,1) output from the two lower full adders and the carry value C_(1,0) output from the lowest full adder, which are selected from the carry values and sum values of (n+4) bits from the CSA1 120, and outputs a determination value q₂q₁q₀ of 3 bits to determine a multiple of the modular reduction. The quotient logic 130 consists of a D flip flop 132, a full adder 134, an exclusive OR (XOR) logic gate 136, and a combinational circuit 138. The D flip flop 132 temporarily stores a carry input value, Carry-in, from the FA 160. The full adder 134 full-adds the carry input value Carry-in stored in the D flip flop 132 and the sum value S_(1,0) output from the least significant bit full adder 121 of the CSA1 120. The exclusive OR logic 136 performs an exclusive OR operation between the carry value C_(1,0) output from the least significant bit full adder 121 of the CSA1 120 and the sum value S_(1,1) output from a second full adder 122. Each of the full adder 134 and the exclusive OR logic 136 is provided with a preset carry value cin for correction, and the full adder 134 is also provided with a sign bit (B sign) of the multiplicand. The combinational circuit 138 combines the output S₀ from the full adder 134, the output S₁ from the exclusive OR logic 136, and a preset input bit n1, and outputs the determination value q₂q₁q₀ of 3 bits.

[0059] FIG. 5 is a block diagram showing a detailed configuration of the CSA2 150 shown in FIG. 1.

[0060] Referring to FIG. 5, the CSA2 150 includes (n+4) full adders 151 to 156. The CSA2 150 receives the modulo numbers N (N₀ to N_(n+3)) of (n+4) bits selected by the selector 140 as a first input signal, the remaining carry values C_(1,0) to C_(1,n+2) of (n+3) bits, that is, the carry values of (n+4) bits from the CSA1 120 except the most significant bit carry value, as a second input signal, and the remaining sum values S_(1,1) to S_(1,n+3) of (n+3) bits, that is, the sum values of (n+4) bits from the CSA1 120 except the least significant bit sum value, as a third input signal, and outputs carry values C_(2,0) to C_(2,n+3) of (n+4) bits and sum values S_(2,0) to S_(2,n+3) of (n+4) bits by means of the (n+4) full adders 151 to 156. The (n+4) bits of the first input signal are sequentially input, starting from the least significant bit full adder 151, to the respective full adders 151 to 156, the (n+3) bits of the second input signal are sequentially input, starting from the second lower full adder 152, to the respective full adders 152 to 156, and the (n+3) bits of the third input signal are sequentially input, starting from the second lower full adder 152, to the respective full adders 152 to 156. The least significant bit full adder 151 of the full adders 151 to 156 receives the output S₀ from the full adder 134 of the quotient logic 130, q_(i,2), and the least significant bit N₀ of the modulo numbers N.

[0061]FIG. 6 is a block diagram showing a detailed configuration of thefull adder 160 shown in FIG. 1.

[0062] Referring to FIG. 6, the full adder 160 full-adds a carry value C_(2,0) output from the least significant bit full adder 151 of the CSA2 150 and a sum value S_(2,1) output from the second-lowest full adder 152 to output a carry input value Carry-in. The full adder 160 is also provided with a carry value cin for correction, preset for the full-add operation, and outputs the carry input value Carry-in as a result of the full-add operation. The carry input value Carry-in is provided to the quotient logic 130.

[0063] B-2. Principle of the Invention

[0064] The present invention provides a device for calculating A·B·R⁻¹modN in m+2 clocks with A, B and N (where R=4^(m+2), m=n/2, −N≦A, and B<N), each having n bits as its inputs. Three principles that are applicable to the implementation of the present invention will be described. The three principles include a first principle of representation of the multiplier A and the multiplicand B for modular multiplication, a second principle of recording of the multiplier A for modular multiplication, and a third principle of the Montgomery algorithm using the principle of recording of the present invention.

[0065] B-2.a. Number Representation

[0066] In the present invention, the multiplier A and the multiplicand B are represented by signed binary numbers for the modular multiplication. A and B, each having n bits, are respectively transformed to (n+4) bits for signed operation. During this transformation, any negative values are transformed to their one's complement.

[0067] B-2.b. Booth's Recording

[0068] The present invention employs a modified Booth recording system, which is a modification of the Booth recording system well known to those skilled in the art to which the invention pertains, in order to increase the speed of the modular multiplication. The multiplier A is recorded as 2 bit z_(i) (where 0≦i≦m+1) by means of the modified Booth recording system. Here, it is assumed that a_(n+4)=a_(n+3) and a_(−1)=0. The following Table 1 shows a rule of the modified Booth recording according to the present invention.

TABLE 1

a_(i+1)   a_(i)   a_(i−1)   z_(i+1)
   0        0        0         0
   0        0        1         1
   0        1        0         1
   0        1        1         2
   1        0        0        −2
   1        0        1        −1
   1        1        0        −1
   1        1        1         0
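The recording rule of Table 1 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the circuit of the invention: the function name `booth_digits` is hypothetical, and Python's sign-extending right shift stands in for the assumption a_(n+4)=a_(n+3). The multiplier 0x93E is the value used in the worked example later in the description.

```python
def booth_digits(a, num_digits):
    """Recode integer a into radix-4 digits in {-2, -1, 0, 1, 2}, LSB first.

    Each digit equals a_(i) + a_(i-1) - 2*a_(i+1), which reproduces Table 1.
    (Hypothetical helper name; software model only.)
    """
    digits = []
    prev = 0  # a_(-1) = 0
    for k in range(num_digits):
        low = (a >> (2 * k)) & 1        # a_(i)
        high = (a >> (2 * k + 1)) & 1   # a_(i+1)
        digits.append(low + prev - 2 * high)
        prev = high                     # becomes a_(i-1) of the next digit
    return digits


A = 0x93E                       # 12-bit multiplier of the worked example
z = booth_digits(A, 8)          # m + 2 = 8 digits for n = 12
print(z)                        # [-2, 0, 0, 1, 1, -2, 1, 0]
print(sum(d * 4**k for k, d in enumerate(z)) == A)  # True
```

The printed digits agree with the A_(i) column of the worked example, and the weighted sum of the digits recovers the original multiplier.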

[0069] B-2.c. Radix-4 Montgomery Algorithm Using Booth's Recording

[0070] The algorithm illustrated in the following Equation 1 shows that the present invention employs the modified Booth recording system for radix-4 Montgomery modular multiplication. An original Montgomery algorithm compares a result value with a modulus N, and performs a subtraction operation if the result value is greater than the modulus N. However, the following algorithm of the present invention does not show such a comparison and subtraction operation of the original Montgomery algorithm.

[0071] Equation 1

[0072] Input: N, −N≦A,B<N

[0073] Output: S=A·B·4^(−m−2)modN, −N≦S<N

S = 0  (1)
for i = 0 to (n+1)/2  (2)
    S = S + A_(i) × B  (3)
    q_(i(2,1,0)) = f(s₁, s₀, n₁, n₀)  (4)
    S = S + q_(i) × N  (5)
    S = S / 2²  (6)
endfor  (7)

[0074] In the algorithm of Equation 1, A_(i) in procedure (3) refers to two Booth-recorded bits and has a value of −2≦A_(i)≦2. Procedure (4) refers to a function that causes the two least significant bits of the result values in procedure (5) to be ‘0’. Result values in procedure (4) depend on input bits s₁, s₀, n₁, and n₀ and are determined as shown in the following Table 2. q_(i2), the most significant bit (MSB) of a value q_(i) used for modular reduction, is a sign bit. q_(i) is one of the elements {0, ±1, 2} and is calculated according to the following Equation 2.

[0075] Equation 2

\[
\begin{aligned}
q_1 &= \overline{s_0}\, s_1 \\
q_2 &= s_0\, \overline{s_1}\, \overline{n_1} + s_0\, s_1\, n_1 \\
q_0 &= s_0
\end{aligned}
\]

[0076] TABLE 2

s₀   s₁   n₁   q₂   q₁q₀
0    0    0    0    00
0    0    1    0    00
0    1    0    0    10
0    1    1    0    10
1    0    0    1    01
1    0    1    0    01
1    1    0    0    01
1    1    1    1    01
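The loop of Equation 1 together with the quotient rule of Table 2 can be modeled with ordinary integers, as in the following sketch. It is an arithmetic model only: it ignores the carry-save datapath and the one's complement correction of the actual circuit, the names `quotient_digit` and `montgomery_booth` are hypothetical, and the loop count m+2 is taken from the clock count stated in the description.

```python
def quotient_digit(s, n):
    """Table 2: return q in {-1, 0, 1, 2} such that (s + q*n) % 4 == 0 (n odd)."""
    s0, s1, n1 = s & 1, (s >> 1) & 1, (n >> 1) & 1
    q0 = s0
    q1 = (1 - s0) & s1
    q2 = (s0 & (1 - s1) & (1 - n1)) | (s0 & s1 & n1)  # sign bit
    magnitude = 2 * q1 + q0
    return -magnitude if q2 else magnitude


def montgomery_booth(a, b, n, n_bits):
    """Return S congruent to a*b*4^(-(m+2)) mod n, with m = n_bits // 2."""
    m = n_bits // 2
    s, prev = 0, 0
    for k in range(m + 2):
        low, high = (a >> (2 * k)) & 1, (a >> (2 * k + 1)) & 1
        z = low + prev - 2 * high       # Booth digit A_(i), Table 1
        prev = high
        s += z * b                      # procedure (3)
        s += quotient_digit(s, n) * n   # procedures (4) and (5)
        s >>= 2                         # procedure (6); exact because s % 4 == 0
    return s


N, A, B = 0xA59, 0x93E, 0x5C3           # 12-bit values of the worked example
S = montgomery_booth(A, B, N, 12)
print(S, S % N)                         # -17 2632
print(S % N == (A * B * pow(4, -8, N)) % N)  # True
```

The raw result may be negative, which is consistent with the stated output range −N≦S<N; reducing it modulo N gives 2632 (0xA48). The three-argument `pow` with a negative exponent requires Python 3.8 or later.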

[0077] B-3. Operation of the Invention

[0078] The apparatus of the present invention as shown in FIG. 1 calculates A·B·R⁻¹modN in m+2 clocks with A, B and N (where R=4^(m+2), m=n/2, −N≦A, and B<N), each having n bits as its inputs.

[0079] A procedure for calculating A·B·R⁻¹modN (where R=4^(m+2)) by the apparatus shown in FIG. 1 will now be described. In the following description, step a) is an initialization step, steps b) to h) are steps to be performed every clock, and step i) is a step to be performed after steps b) to h) are performed during (m+2) clocks.

[0080] a) A, B, and N, each having n bits, input for modular multiplication, are stored in respective registers (or memories). Although the apparatus of the present invention is shown to store the inputs A and B in respective registers 102 and 104 without showing a separate register in which N is stored, it is apparent to those skilled in the art that such a separate register is used in the apparatus of the present invention. Here, the register 102 in which A is stored is a shift register in which A is shifted to the right side by two bits for each clock. For convenience's sake, the register in which A is stored is indicated as register A and the register in which B is stored is indicated as register B. With respect to the memory, A and B are read out one word at a time. Temporary registers (or memories) C and S (both not shown in detail), in which a result of the calculation by the CSA2 150 shown in FIG. 1 is temporarily stored, are initialized as ‘0’.

[0081] b) When all data is input into each of the registers 102 and 104, the Booth recording circuit 112 of the recording logic 110 performs a Booth recording function based on the two LSB bits in the register 102. The MUX 114 of the recording logic 110 has as its input a value of B stored in the register 104 and generates one of the values 0, ±B, ±2B, which is provided as one of three inputs to the CSA1 120, based on the two LSB bits in the register 102. At this time, the one's complementer 116 of the recording logic 110 changes one of the values of 0, ±B, ±2B into its one's complement based on the two LSB bits in the register 102, and represents the one's complement as an n+4 bit number, which is provided as one of three inputs of the CSA1 120.

[0082] c) The CSA1 120 performs an add operation for three input signed binary numbers of n+4 bits. The CSA1 120 is composed of n+4 full adders 121 to 125. Carries generated in full adders at a previous stage are provided to the full adder at the next stage, while carries generated in the MSB full adder 125 are ignored.

[0083] d) The quotient logic 130 has as its inputs output values S_(1,1), C_(1,0), and S_(1,0) from the CSA1 120, a Carry-in signal provided from the full adder 160, and a sign bit B sign of the multiplicand B, and calculates and outputs S₁ and S₀ by means of the full adder 134 and the exclusive OR logic 136. The carry signal cin for correction is input to the full adder 134 and the exclusive OR logic 136. The carry signal cin is a signal for correcting a difference between the existing Booth recording system using two's complement and the modified Booth recording system of the present invention using one's complement.

[0084] e) The combinational circuit 138 of the quotient logic 130 has as its input S₁ and S₀ calculated in step d) and determines a value q of 3 bits by means of a truth table of Table 2. Although a detailed configuration of a circuit to determine the value of q by means of the truth table of Table 2 is not shown, it is apparent to those skilled in the art that a circuit for determining the value of q can be implemented by a general logic gate circuit.

[0085] f) The CSA2 150 has as its inputs carry values and sum values obtained as outputs of the CSA1 120 in step c), and a signed binary number of n+4 bits of one selected from 0, ±N, and ±2N determined by two LSB bits of values of q obtained in step e) to perform an n+4 bit signed operation. The CSA2 150 is composed of n+4 full adders 151 to 156. The LSB full adder 151 of the full adders 151 to 156 has as its carry input an MSB value q_(i,2) of the value of q calculated in step e).

[0086] g) The full adder 160 has as its inputs S_(2,1) and C_(2,0) bits of output values of the CSA2 150 and bits of the carry signal cin for correction to output Carry-in bits through full adding of the inputs. This full adding operation is for correcting a difference between the existing Booth recording system using two's complement and the modified Booth recording system of the present invention using one's complement.

[0087] h) (n+2) sum values and (n+3) carry values from the MSBs of the outputs of the CSA2 150 are fed back to the CSA1 120 as its input. At this time, S_(2,n+3), being the MSB of a sum value which is an output from the MSB full adder 156 of the CSA2 150, is copied and two bits are added thereto, and C_(2,n+3), being the MSB of a carry value which is an output from the MSB full adder 156 of the CSA2 150, is copied and one bit is added thereto. Results of such a copy and an addition for S_(2,n+3) and C_(2,n+3) are input to the CSA1 120. The sum value S_(2,n+3) output from the full adder 156 of the CSA2 150 is provided to three full adders 123 to 125 of the CSA1 120, and the carry value C_(2,n+3) is provided to two full adders 124 and 125 of the CSA1 120.

[0088] i) The following operation is performed after steps b) to h) are performed during (m+2) clocks. A carry propagation adder (CPA) (not shown) performs an addition operation for the carry value and the sum value, which are outputs of the CSA2 150. If a result value of the addition is a negative number, a modulus N is added thereto, but if the result value of the addition is a positive number, the modulus N is not added thereto.
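Two building blocks used in steps b) to i) above, the carry save adder and the one's complement representation, can be illustrated with the following sketch. It is a word-level model under stated assumptions: the names `csa` and `ones_complement` are hypothetical, the word width is fixed at n+4 = 16 bits for n = 12, and the operands 0x1234 and 0x0F0F are arbitrary test values, not taken from the description.

```python
WIDTH = 16                     # n + 4 bits for the n = 12 example
MASK = (1 << WIDTH) - 1


def csa(x, y, z):
    """One row of full adders: return (sum_word, carry_word), WIDTH bits each."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (y & z) | (x & z)) << 1  # each carry moves up one bit
    return sum_word & MASK, carry_word & MASK        # carry out of the MSB is dropped


def ones_complement(x):
    """Bitwise inversion within WIDTH bits."""
    return ~x & MASK


B = 0x5C3
b_neg = ones_complement(B)
print(hex(b_neg))                          # 0xfa3c

# One's complement is one less than the two's complement negative; this is
# the gap that the correction carry cin has to close.
print((b_neg + 1) & MASK == (-B) & MASK)   # True

# The sum and carry words together still represent x + y + z (mod 2^WIDTH);
# a carry propagation adder resolves them into a single number at the end.
s_word, c_word = csa(0x1234, 0x0F0F, b_neg)
print((s_word + c_word) & MASK == (0x1234 + 0x0F0F + b_neg) & MASK)  # True
```

Keeping the running value as a sum word and a carry word avoids carry propagation in every clock, which is why the carry propagation adder is only needed once, in step i).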

[0089] For example, if each of A, B and N has 12 bits as shown in the following Equation 3, a Montgomery modular operation result according to the above-described procedure is as shown in the following Table 3 and Table 4.

[0090] Equation 3

N=0000.1010.0101.1001 (0xA59) B=0000.0101.1100.0011 (0x5C3)

N′=1111.0101.1010.0110 B′=1111.1010.0011.1100

2N=0001.0100.1011.0010 2B′=1111.0100.0111.1001

A=0000.1001.0011.1110 (0x93E)

[0091] TABLE 3

I   A_(i)   CSA1 out S            B-sign   Carry-in   S₁S₀   C   CSA1 out C
I     0     0000.0000.0000.0000     0         0        00    0   0.0000.0000.0000.000
0    −2     1111.0100.0111.1001     1         0        10    1   0.0000.0000.0000.000
1     0     1111.0010.0010.1010     0         1        11    0   0.0001.0000.0010.100
2     0     1111.0011.0000.0000     0         1        01    0   0.0001.0000.0010.100
3     1     1111.1000.1111.0000     0         1        11    0   0.0000.1011.0000.011
4     1     1111.1110.1000.0000     0         1        11    0   0.0000.1010.1101.011
5    −2     0000.1110.1001.0010     1         1        10    1   1.1110.1010.1101.001
6     1     1111.1110.1011.0110     0         1        01    0   0.0000.1010.1001.001
7     0     1111.1111.0011.1011     0         1        00    1   0.0000.0000.0000.000

[0092] TABLE 4

I   A_(i)   S₁S₀   C   q₂q₁q₀   CSA2 out S                 Carry-in   CSA2 out C
I     0      00    0    000     0000.0000.0000.0000           0       0.0000.0000.0000.000
0    −2      10    1    010     (11).1110.0000.1100.1010      1       (0)0.0010.1000.0110.000
1     0      11    0    001     (11).1110.1000.0101.0010      1       (0)0.0010.0100.0101.001
2     0      01    0    101     (00).0001.0110.1000.1110      1       (1)1.1110.0010.0100.001
3     1      11    0    001     (11).1111.1001.1010.1110      1       (0)0.0001.0100.1010.001
4     1      11    0    001     (11).1111.1110.0000.1110      1       (0)0.0001.0101.1010.001
5    −2      10    1    010     (11).1111.0000.1111.0010      1       (0)0.0001.1101.0010.010
6     1      01    0    101     (00).0000.0001.1000.0010      1       (1)1.1111.1101.0110.111
7     0      00    1    000     1111.1111.1011.1010           1       0.0000.0000.0000.000

[0093] A procedure for calculating the modular multiplication A·BmodN using the result values of the operation by the apparatus of the present invention as described above will now be described. It should be noted that a hardware configuration for performing the procedure is apparent to those skilled in the art, and hence, detailed explanation thereof is omitted. The following calculations are performed:

[0094] 1) Calculate P=2^(2(n+4))modN;

[0095] 2) Calculate C=A·B·2^(−(n+4))modN; and

[0096] 3) Calculate P·C·2^(−(n+4))modN=A·BmodN.
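The three calculations above can be checked numerically with the short sketch below. The function `mont` is a hypothetical reference model that returns x·y·R⁻¹ mod N directly, standing in for the output of the apparatus rather than reproducing its datapath; the operand values are those of the 12-bit worked example.

```python
def mont(x, y, n, r):
    """Reference model of the apparatus output: x*y*r^(-1) mod n."""
    return (x * y * pow(r, -1, n)) % n   # modular inverse via pow needs Python 3.8+


n_bits = 12
N, A, B = 0xA59, 0x93E, 0x5C3
R = 1 << (n_bits + 4)                    # 2^(n+4), which equals 4^(m+2)

P = pow(2, 2 * (n_bits + 4), N)          # calculation 1): P = R^2 mod N
C = mont(A, B, N, R)                     # calculation 2): A*B*R^(-1) mod N
result = mont(P, C, N, R)                # calculation 3): cancels the extra R^(-1)

print(hex(C))                            # 0xa48
print(result == (A * B) % N)             # True
```

The second pass multiplies by P = R² mod N, so the two factors of R⁻¹ introduced by the two Montgomery multiplications cancel and the plain product A·B mod N remains.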

[0097] A procedure for calculating the modular exponentiation, m^(e)modN, required to perform the RSA operation using the result values of the operation of the apparatus of the present invention as described above will now be described. The following operations are performed:

[0098] 1) Store an exponent e in a register (or a memory);

[0099] 2) Store a modulus N in the temporary register C;

[0100] 3) Initialize the temporary registers C and S to ‘0’;

[0101] 4) Perform Montgomery modular multiplication, m′=f_(m)(m,P,N)=m·P·R⁻¹modN, where P in the modular exponentiation is a pre-calculated value defined in the aforementioned procedure, and R=2^(n+4);

[0102] 5) Load m′ into the register B;

[0103] 6) Perform a modular square operation using the value loaded into the register B, where the multiplier A required for the Montgomery modular multiplication is loaded from the register B and its value is obtained by using the modified Booth recording circuit;

[0104] 7) Shift the exponent e to the left;

[0105] 8) Ignore the leading (MSB) 1 of the exponent e and perform subsequent steps 9) and 10) for the following bits;

[0106] 9) Perform steps 4) and 5) for the modular square operation regardless of a bit (0 or 1) of the exponent e, where the multiplier and the multiplicand, which are required for the square operation, are stored in the register A and the register B, respectively;

[0107] 10) If the current bit of the exponent e is 1, perform steps 4) and 5) for the modular multiplication after performing step 9), where the multiplicand is the content of the register B and the multiplier is the base m′ in the exponentiation; and

[0108] 11) Perform the modular multiplication once more using step 4) after performing steps 8) to 10) for all bits of the exponent e, where the multiplicand is the content of the register B and the multiplier is 1.

[0109] If a result value of the performance of the CPA for values remaining in the registers C and S after performing the above steps 1) to 11) is a negative number, the modulus N is added thereto. Otherwise, if the result value is a positive number, it becomes a final value of the exponentiation, m^(e)modN, with no addition of the modulus N.
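The exponentiation procedure of steps 1) to 11) corresponds to a left-to-right square-and-multiply loop carried out in the Montgomery domain. The sketch below models it with plain integers; `mont` and `mont_pow` are hypothetical names, `mont` is a reference model of the apparatus output rather than its datapath, and the base 5 and exponent 117 are arbitrary illustration values.

```python
def mont(x, y, n, r_inv):
    """Reference model of one Montgomery multiplication: x*y*R^(-1) mod n."""
    return (x * y * r_inv) % n


def mont_pow(m, e, n, n_bits):
    """Compute m^e mod n for e >= 1 using only Montgomery multiplications."""
    r = 1 << (n_bits + 4)                 # R = 2^(n+4)
    r_inv = pow(r, -1, n)                 # Python 3.8+
    p = (r * r) % n                       # pre-calculated P = R^2 mod N
    m_bar = mont(m, p, n, r_inv)          # step 4): m' = m*R mod N
    acc = m_bar                           # step 5): load m' into register B
    for bit in bin(e)[3:]:                # step 8): skip the leading 1 of e
        acc = mont(acc, acc, n, r_inv)    # step 9): square for every bit
        if bit == "1":
            acc = mont(acc, m_bar, n, r_inv)  # step 10): multiply by m'
    return mont(acc, 1, n, r_inv)         # step 11): multiply by 1 to leave the domain


N = 0xA59
print(mont_pow(5, 117, N, 12) == pow(5, 117, N))  # True
```

The final multiplication by 1 removes the factor R that every intermediate value carries, which is the purpose of step 11).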

[0110] B-4. Effect of the Invention

[0111] As apparent from the above description, the present invention provides a circuit for calculating A·B·2^(−(n+4))modN, making the general modular multiplication A·BmodN possible by means of the circuit. A·BmodN calculated according to the present invention is applicable to hardware apparatuses employable in devices for generating and verifying digital signatures. In addition, the present invention is applicable to hardware apparatuses for generating electronic signatures, authentication, and encryption/decryption based on IC cards. In addition, the present invention can provide devices for encrypting and decrypting data or information by means of the electronic signature apparatus for performing the modular multiplication. Furthermore, the present invention can be used to implement existing public key cryptography systems such as NIST-DSS, RSA, ElGamal, and Schnorr electronic signatures, based on the electronic signature apparatus.

[0112] C. Second Embodiment

[0113] C-1. Configuration of the Invention

[0114] FIG. 7 is a block diagram showing a configuration of a modular multiplication apparatus in accordance with the second embodiment of the present invention.

[0115] Referring to FIG. 7, the modular multiplication apparatus includes recording logic 210, a first carry save adder (hereinafter, abbreviated as “CSA1”) 220, quotient logic 230, a selector 240, a second carry save adder (CSA2) 250, and an AND logic gate 260. The modular multiplication apparatus is a hardware device for calculating A·B·R⁻¹modN in (m+2) clocks with A, B and N (where R=4^(m+2), m=n/2, −N≦A, and B<N), each having n bits as its inputs, according to a modified Montgomery algorithm. Namely, the modular multiplication apparatus has a configuration for calculating A·B·2^(−(n+4))modN.

[0116] Each of the CSAs 220 and 250 is composed of (n+4) full adders in parallel, each of which has a 3 bit input, and outputs a carry bit and a sum bit. The recording logic 210 performs a modified Booth recording operation based on the multiplier A, and selects and outputs one of the values of 0, B, 2B, and 3B of (n+3) bits. The quotient logic 230 has as its inputs a least significant bit (LSB) carry value C_(1,0) and two sum LSB bits S_(1,1) and S_(1,0) from the CSA1 220, a carry-in, and a sign bit of B, and outputs q₁q₀ of 2 bits, which is a value for determining a multiple of the modular reduction. The selector 240, which can be implemented by multiplexers (MUXs), selects and outputs one of 0, N, 2N, and 3N based on a determined value of q. The AND logic 260 performs an AND operation, with two bits S_(2,1) and C_(2,0) output from the CSA2 250 as its inputs, and provides a result value of the operation to the quotient logic 230 as a carry-in signal.

[0117] Although not shown in detail in FIG. 7, it should be noted that the modular multiplication apparatus includes temporary storing registers C and R for storing carry values and sum values, which are the outputs from the CSA2 250, for each clock, and a carry propagation adder for adding values stored in the temporary storing registers C and R and outputting a resultant value as a result of the modular multiplication.

[0118] FIG. 8 is a block diagram showing a detailed configuration of the recording logic 210 shown in FIG. 7.

[0119] Referring to FIG. 8, the recording logic 210 Booth-records the two lower bits of a bit string generated by sequentially shifting bits of the multiplier A, multiplexes a result of the Booth recording with the multiplicand B, and outputs binary numbers of (n+3) bits. For this purpose, a shift register 202 for sequentially shifting bits of the multiplier to generate a shifted bit string and a register 204 for storing the multiplicand are provided at the front stage of the recording logic 210. The recording logic 210 also includes a multiplexer (MUX) 212. The multiplexer 212 multiplexes the two lower bits a_(i+1) and a_(i) of the generated bit string with the multiplicand, and outputs 0, B, 2B and 3B as a result of multiplexing. The recording logic 210, which is a circuit implementing a modified Booth recording based on the multiplier A, selects and outputs one of the values of 0, B, 2B and 3B of (n+3) bits.

[0120] FIG. 9 is a block diagram showing a detailed configuration of the CSA1 shown in FIG. 7.

[0121] Referring to FIG. 9, the CSA1 220 having (n+3) full adders 221 to 225 has as its inputs first signals S_(2,2) to S_(2,n+2) of (n+1) bits, second signals C_(2,1) to C_(2,n+2) of (n+2) bits, and third signals B₀ to B_(n+2) being the binary numbers of (n+3) bits from the recording logic 210, and full-adds the inputs by means of the (n+3) full adders 221 to 225 to output carry values C_(1,0) to C_(1,n+2) and sum values S_(1,0) to S_(1,n+2) of (n+3) bits. The first and second signals are signals provided from the CSA2 250 and the third signals are signals provided from the recording logic 210. A most significant bit S_(2,n+2) of the first signals is input to the third-highest full adder 223 of the full adders, and a most significant bit C_(2,n+2) of the second signals is input to the second-highest full adder 224 of the full adders. A most significant bit full adder 225 of the full adders is provided with “0” as the first and second signals and the second-highest full adder 224 is provided with “0” as the first signals. Namely, the first signals S_(2,2) to S_(2,n+2) of (n+1) bits are sequentially input to the full adders from a least significant bit full adder 221 to an (n+1)th full adder 223 of the CSA1 220, respectively, and “0” is input as the first signal to an (n+2)th full adder 224 and an (n+3)th full adder 225. In addition, the second signals C_(2,1) to C_(2,n+2) of (n+2) bits are sequentially input to the full adders from the least significant bit full adder 221 to the (n+2)th full adder 224 of the CSA1 220, respectively, and “0” is input as the second signal to the (n+3)th full adder 225. In addition, the third signals B₀ to B_(n+2) of (n+3) bits are sequentially input to the least significant bit full adder 221 and to the (n+1)th full adder 223 of the CSA1 220, respectively.

[0122] FIG. 10 is a block diagram showing a detailed configuration of the quotient logic 230 shown in FIG. 7.

[0123] Referring to FIG. 10, the quotient logic 230 has as its inputs sum values S_(1,0) and S_(1,1) output from the two lower full adders and a carry value C_(1,0) output from the lowest full adder, which are selected from the carry values and sum values of (n+4) bits from the CSA1 220, and outputs a determination value q₁q₀ of 2 bits to determine a multiple of the modular reduction. The quotient logic 230 consists of a D flip flop 232, a half adder (HA) 234, an exclusive OR (XOR) logic gate 236, and a combinational circuit 238. The D flip flop 232 temporarily stores a carry input value Carry-in input thereto from the AND logic 260. The half adder 234 half-adds the carry input value Carry-in stored in the D flip flop 232 and the sum value S_(1,0) output from the least significant bit full adder 221 of the CSA1 220. The exclusive OR logic 236 performs an exclusive OR operation between the carry value C_(1,0) output from the least significant bit full adder 221 of the CSA1 220 and the sum value S_(1,1) output from a second-lowest full adder 222. The combinational circuit 238 combines an output S₀ from the half adder 234, an output S₁ from the exclusive OR logic 236, and a preset input bit n₁ to output the determination value q₁q₀ of 2 bits.

[0124] FIG. 11 is a block diagram showing a detailed configuration of the CSA2 shown in FIG. 7.

[0125] Referring to FIG. 11, the CSA2 250 has (n+3) full adders 251 to 256. The CSA2 250 has modulo numbers N (N₀ to N_(n+2)) of (n+3) bits selected from the selector 240 as first input signals, remaining carry values C_(1,0) to C_(1,n+2) of (n+2) bits, except a most significant bit carry value of the carry values of (n+3) bits, from the CSA1 220 as second input signals, and remaining sum values S_(1,1) to S_(1,n+2) of (n+2) bits, except a least significant bit sum value of the sum values of (n+3) bits, from the CSA1 220 as third input signals, and outputs carry values C_(2,0) to C_(2,n+2) of (n+3) bits and sum values S_(2,0) to S_(2,n+2) of (n+3) bits by means of the (n+3) full adders 251 to 256. The (n+3) bits of the first input signals are sequentially input, starting from a least significant bit full adder 251, to respective full adders 251 to 256, the (n+2) bits of the second input signals are sequentially input, starting from a second-lowest full adder 252, to respective full adders 252 to 256, and the (n+2) bits of the third input signals are sequentially input, starting from the second-lowest full adder 252, to respective full adders 252 to 256. The least significant bit full adder 251 of the full adders 251 to 256 is input with the output S₀ from the half adder 234 of the quotient logic 230 and the carry input value Carry-in from the AND logic 260.

[0126] FIG. 12 is a block diagram showing a detailed configuration of the AND logic shown in FIG. 7.

[0127] Referring to FIG. 12, the AND logic 260 performs an AND operation on a carry value C_(2,0) output from the least significant bit full adder 251 of the CSA2 250 and a sum value S_(2,1) output from the second-lowest full adder 252 to output the carry input value Carry-in. The carry input value Carry-in is provided to the quotient logic 230.

[0128] C-2. Principle of the Invention

[0129] The present invention provides a device for calculating A·B·R⁻¹modN in m+2 clocks with A, B and N (where R=4^(m+2), m=n/2, −N≦A, and B<N), each having n bits as its inputs. Two principles that are applicable to implementation of the present invention will now be described. The two principles include a first principle of representation of the multiplier A and the multiplicand B for modular multiplication and a second principle of the Montgomery algorithm using a principle of recording of the present invention.

[0130] C-2.a 2 bit Scanning

[0131] In the present invention, the multiplier A is scanned (or shifted) by two bits from the LSB for each clock and is then multiplied with the multiplicand B, and a result of the multiplication is used for the Montgomery algorithm. Therefore, A_(i) generated in each loop, which is one of the elements {0, 1, 2, 3}, is multiplied with the multiplicand B, and a result of the multiplication is input to the CSA1 220.
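As a small illustration of the 2 bit scanning, the digits taken from the multiplier in successive clocks can be listed as follows. The helper name `scan_digits` is hypothetical, and the multiplier is the 12-bit value used in the worked example of this embodiment.

```python
def scan_digits(a, num_digits):
    """Return the radix-4 digits of a, LSB first, each in {0, 1, 2, 3}."""
    return [(a >> (2 * k)) & 3 for k in range(num_digits)]


A = 0x93E
digits = scan_digits(A, 8)      # m + 2 = 8 clocks for n = 12
print(digits)                   # [2, 3, 3, 0, 1, 2, 0, 0]
print(sum(d * 4**k for k, d in enumerate(digits)) == A)  # True
```

Each digit selects one of 0, B, 2B and 3B, which is why the multiples 2B and 3B are needed alongside B.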

[0132] C-2.b. Radix-4 Montgomery Algorithm

[0133] The following algorithm illustrated in Equation 4 shows that the present invention employs radix-4 Montgomery modular multiplication. An original Montgomery algorithm compares a result value with a modulus N, and performs a subtraction operation if the result value is greater than the modulus N. However, the following algorithm of the present invention does not show such a comparison and subtraction operation of the original Montgomery algorithm.

[0134] Equation 4

[0135] Input: N, −N≦A,B<N

[0136] Output: S=A·B·4^(−m−2)modN, 0≦S<N

S = 0  (1)
for i = 0 to (n+1)/2  (2)
    S = S + A_(i) × B  (3)
    q_(i(1,0)) = f(s₁, s₀, n₁, n₀)  (4)
    S = S + q_(i) × N  (5)
    S = S / 2²  (6)
endfor  (7)

[0137] In the algorithm of Equation 4, A_(i) in procedure (3) relates to two scanned bits. Procedure (4) relates to a function to cause the two least significant bits of the result values in procedure (5) to be ‘0’. The result values in procedure (4) depend on input bits s₁, s₀, n₁, and n₀, and, for the Montgomery modular multiplication, are actually determined as shown in the following Table 5, since N is an odd number and n₀ is always 1. A value q_(i) used for modular reduction is one of the elements of {0, 1, 2, 3} and is calculated according to the following Equation 5.

[0138] Equation 5

\[
\begin{aligned}
q_0 &= s_0 \\
q_1 &= s_0\, \overline{s_1}\, \overline{n_1} + \overline{s_0}\, s_1 + s_1\, n_1
\end{aligned}
\]

[0139] TABLE 5

s₀   s₁   n₁   q₁   q₀
0    0    0    0    0
0    0    1    0    0
0    1    0    1    0
0    1    1    1    0
1    0    0    1    1
1    0    1    0    1
1    1    0    0    1
1    1    1    1    1
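The loop of Equation 4 can be modeled with ordinary integers, as in the sketch below. It is an arithmetic model only and ignores the carry-save datapath; the names `quotient_digit` and `montgomery_radix4` are hypothetical, the loop count m+2 is taken from the stated clock count, and the quotient digit is chosen so that S + q·N is divisible by 4, which is the requirement stated for procedure (4).

```python
def quotient_digit(s, n):
    """Return q in {0, 1, 2, 3} such that (s + q*n) % 4 == 0 (n odd)."""
    s0, s1, n1 = s & 1, (s >> 1) & 1, (n >> 1) & 1
    q0 = s0
    q1 = (s0 & (1 - s1) & (1 - n1)) | ((1 - s0) & s1) | (s1 & n1)
    return 2 * q1 + q0


def montgomery_radix4(a, b, n, n_bits):
    """Return (S, q digits) with S congruent to a*b*4^(-(m+2)) mod n."""
    m = n_bits // 2
    s, quotients = 0, []
    for k in range(m + 2):
        a_k = (a >> (2 * k)) & 3        # two scanned bits, A_(i)
        s += a_k * b                    # procedure (3)
        q = quotient_digit(s, n)        # procedure (4)
        quotients.append(q)
        s += q * n                      # procedure (5)
        s >>= 2                         # procedure (6); exact because s % 4 == 0
    return s, quotients


N, A, B = 0xA59, 0x93E, 0x5C3           # 12-bit values of the worked example
S, qs = montgomery_radix4(A, B, N, 12)
print(hex(S))                           # 0xa48
print(qs)                               # [2, 1, 3, 0, 1, 2, 3, 3]
print(S == (A * B * pow(4, -8, N)) % N) # True
```

For these inputs the model returns 0xA48 and a quotient sequence that matches the q₁q₀ column of the worked example (10, 01, 11, 00, 01, 10, 11, 11). The three-argument `pow` with a negative exponent requires Python 3.8 or later.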

[0140] C-3. Operation of the Invention

[0141] The apparatus of the present invention as shown in FIG. 7 calculates A·B·R⁻¹modN in m+2 clocks with A, B and N (where R=4^(m+2), m=n/2, −N≦A, and B<N), each having n bits, as its inputs.

[0142] A procedure for calculating A·B·R⁻¹modN (where R=4^(m+2)) by the apparatus shown in FIG. 7 will now be described. In the following description, step a) is an initialization step, steps b) to h) are steps to be performed every clock, and step i) is a step to be performed after the steps b) to h) are performed during (m+2) clocks.

[0143] a) A, B, and N, each consisting of n bits, input for modular multiplication, are stored in respective registers (or memories). In addition, 2B and 3B of n+2 bits are stored in respective registers (or memories). Although the apparatus of the present invention is shown to store the inputs A and B in respective registers 202 and 204 without showing separate registers in which 2B and 3B are respectively stored, it is apparent to those skilled in the art that such separate registers are used in the apparatus of the present invention. The register 202, in which A is stored, is a shift register in which A is shifted to the right side by two bits for each clock. The register in which A is stored is indicated as register A and the register in which B is stored is indicated as register B. In the case of the memory, A and B are read one word at a time. Temporary registers (or memories) C and S (both not shown in detail), in which a result of calculation by the CSA2 250 shown in FIG. 7 is temporarily stored, are initialized as ‘0’.

[0144] b) When all data is input to each of the registers 202 and 204, the recording logic 210 performs a Booth recording function based on the two LSB bits in the register A 202. The MUX 212 of the recording logic 210 has as its input a value stored in the register B 204 and selects one of the values of 0, B, 2B, 3B, which is provided as one of three inputs of the CSA1 220, based on the two LSB bits in the register A 202.

[0145] c) The CSA1 220 performs an add operation for three input binary numbers of n+3 bits. The CSA1 220 is composed of n+3 full adders 221 to 225.

[0146] d) The quotient logic 230 has as its inputs output values S_(1,1), C_(1,0), and S_(1,0) of the CSA1 220 and a Carry-in signal provided from the AND logic 260, and calculates and outputs S₁ and S₀ by means of the half adder 234 and the exclusive OR logic 236.

[0147] e) The combinational circuit 238 of the quotient logic 230 has as its inputs S₁ and S₀ calculated in step d) and determines a value q of 2 bits by means of a truth table of Table 5. Although a detailed configuration of a circuit to determine the value of q by means of the truth table of Table 5 is not shown, it is apparent to those skilled in the art that a circuit for determining the value of q can be implemented by a general logic gate circuit.

[0148] f) The CSA2 250 has as its inputs carry values and sum values obtained as outputs of the CSA1 220 in step c), and a binary number of n+3 bits of one selected from 0, N, 2N and 3N determined by the two LSB bits of values of q obtained in step e) to perform an n+3 bit non-signed operation. The CSA2 250 is composed of n+3 full adders 251 to 256 like the CSA1 220. It should be noted that the LSB full adder 251 of the full adders 251 to 256 has as its carry input the Carry-in signal generated in a previous stage.

[0149] g) The AND logic 260 has as its inputs S_(2,1) and C_(2,0) bits of output values of the CSA2 250 to output Carry-in bits through an AND operation on the inputs.

[0150] h) (n+2) sum values and (n+3) carry values from MSBs of the outputs of the CSA2 250 are fed back to the CSA1 220 as its input. Two higher bits of the sum values and one higher bit of the carry values are “0”, and two bits are shifted to the right side in the CSA2 250 for the feedback to the CSA1 220. The sum value S_(2,n+2) output from the full adder 256 of the CSA2 250 is provided to the third-highest full adder 223 of the CSA1 220, and the sum value of “0” is provided to the MSB full adder 225 and the second-highest full adder 224. The carry value C_(2,n+2) output from the full adder 256 of the CSA2 250 is provided to the second-highest full adder 224 of the CSA1 220 and the carry value of “0” is provided to the MSB full adder 225.

[0151] i) The following operation is performed after steps b) to h) are performed during (m+2) clocks. A carry propagation adder (CPA) (not shown) performs an addition operation for the carry value and the sum value, which are outputs of the CSA2 250.

[0152] For example, if each of A, B and N has 12 bits as shown in the following Equation 6, a Montgomery modular operation result according to the above-described procedure is as shown in the following Table 6 and Table 7. At this time, a final result of operation is as follows:

Final Result: 0111.1100.0111 (0x7C7) + 0010.1000.0000 (0x280) + 1 = 1010.0100.1000 (0xA48)

[0153] Equation 6

N=000.1010.0101.1001 (0xA59) B=000.0101.1100.0011 (0x5C3)

2N=001.0100.1011.0010 (0x14B2) 2B=000.1011.1000.0110 (0xB86)

3N=001.1111.0000.1011 (0x1F0B) 3B=001.0001.0100.1001 (0x1149)

A=000.1001.0011.1110 (0x93E)

[0154] TABLE 6

I   A_(i)   CSA1 out S            Carry-in   S₁S₀   CSA1 out C
I     0     000.0000.0000.0000       0        00    0000.0000.0000.000
0     2     000.1011.1000.0110       0        10    0000.0000.0000.000
1     3     001.0110.1100.0101       0        11    0000.0010.1001.001
2     3     001.0111.1010.0010       1        01    0000.0010.1001.001
3     0     000.1001.0100.1111       1        00    0000.0101.0000.000
4     1     000.0110.0101.0000       1        11    0000.0011.0000.011
5     2     000.1001.0110.1101       1        10    0000.0111.0000.010
6     0     000.0100.0010.0100       1        01    0000.0101.0010.010
7     0     000.0101.0001.0000       1        01    0000.0101.0000.010

[0155] TABLE 7

I   A_(i)   S₁S₀   q₁q₀   CSA2 out S                   Carry-in   CSA2 out C
I     0      00     00    000.0000.0000.0000              0       0000.0000.0000.000
0     2      10     10    (0.0).001.1111.0011.0100        0       (0).0000.0001.0000.010
1     3      11     01    (0.0)001.1110.0000.1110         1       (0).0000.0101.1010.001
2     3      01     11    (0.0).000.1010.0011.1010        1       (0).0010.1111.0000.011
3     0      00     00    (0.0)000.1100.0100.1110         1       (0).0000.0010.0000.001
4     1      11     01    (0.0)000.1111.0000.1110         1       (0).0000.0100.1010.001
5     2      10     10    (0.0)001.1010.1101.1010         1       (0).0000.1010.0100.101
6     0      01     11    (0.0)001.1110.0000.1010         1       (0).0000.1010.0100.101
7     0      01     11    (0.0)001.1111.0001.1110         1       (0).0000.1010.0000.001
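The final result quoted for this example can be reproduced from the last row (I = 7) of Table 7, as the following sketch shows. The bit alignment is an assumption made for this illustration: the 15-bit CSA2 carry word is taken to carry weights 2¹⁵ down to 2¹, that is, one position above the sum word, and the variable names are hypothetical.

```python
# Last row (I = 7) of Table 7, read as binary words.
s_word = 0b001_1111_0001_1110          # CSA2 out S
c_word = 0b0000_1010_0000_001 << 1     # CSA2 out C, assumed one position higher

# The two bits dropped by the right shift can still produce a carry:
# the AND of S_(2,1) and C_(2,0), which is what the AND logic 260 supplies.
carry_in = (s_word >> 1) & (c_word >> 1) & 1

final = (s_word >> 2) + (c_word >> 2) + carry_in
print(hex(s_word >> 2), hex(c_word >> 2), carry_in)  # 0x7c7 0x280 1
print(hex(final))                                    # 0xa48

N, A, B = 0xA59, 0x93E, 0x5C3
print(final == (A * B * pow(4, -8, N)) % N)          # True
```

Under this reading, the three terms 0x7C7, 0x280 and 1 of the final result are the shifted sum word, the shifted carry word and the carry recovered by the AND operation, and their total agrees with A·B·4⁻⁸ mod N computed directly.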

[0156] A procedure for calculating the modular multiplication A·BmodN using result values of the operation by the apparatus of the present invention as described above will be described as follows. It should be noted that a hardware configuration for performing the procedure is apparent to those skilled in the art, and hence, a detailed explanation thereof is omitted. The following calculations are performed:

[0157] 1) Calculate P = 2^(2(n+4)) mod N;

[0158] 2) Calculate C = A·B·2^(−(n+4)) mod N; and

[0159] 3) Calculate P·C·2^(−(n+4)) mod N = A·B mod N.
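The three calculations above can be illustrated with the 12-bit operands of Equation 6. The following Python sketch is a functional model only: the helper name mont is hypothetical, and it obtains the Montgomery product through a modular inverse (Python 3.8 or later) instead of modelling the carry save hardware. It assumes an odd modulus N.

```python
def mont(x, y, n_mod, r_bits):
    """Functional model of a Montgomery product: x * y * 2**(-r_bits) mod n_mod.

    Assumption: n_mod is odd, so 2**r_bits is invertible modulo n_mod.
    """
    r_inv = pow(2, -r_bits, n_mod)     # R^-1 mod N with R = 2**r_bits
    return (x * y * r_inv) % n_mod


N, A, B = 0xA59, 0x93E, 0x5C3          # operands of Equation 6
n = 12                                 # operand length in bits
k = n + 4                              # R = 2**(n+4)

P = pow(2, 2 * k, N)                   # step 1: P = 2^(2(n+4)) mod N
C = mont(A, B, N, k)                   # step 2: C = A*B*2^(-(n+4)) mod N
product = mont(P, C, N, k)             # step 3: P*C*2^(-(n+4)) mod N

assert C == 0xA48                      # matches the final result of the worked example
assert product == (A * B) % N          # the ordinary modular product is recovered
```

The second multiplication by P cancels the factor 2^(−(n+4)) introduced by each Montgomery product, which is why P needs to be computed only once per modulus.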

[0160] Next, a procedure for calculating the modular exponentiation m^(e) mod N, required to perform the RSA operation, using the result values of the operation of the apparatus of the present invention is described. The procedure is as follows:

[0161] 1) Store an exponent e in a register (or a memory);

[0162] 2) Store a modulus N in the temporary register C;

[0163] 3) Initialize the temporary registers C and S to ‘0’;

[0164] 4) Perform Montgomery modular multiplication, m′ = f_(m)(m,P,N) = m·P·R⁻¹ mod N, where P in the modular exponentiation is the pre-calculated value defined in the aforementioned procedure, and R = 2^(n+4);

[0165] 5) Load m′ into the register B;

[0166] 6) Perform the modular square operation using the value loaded into the register B, where the multiplier A required for the Montgomery modular multiplication is loaded from the register B and its value is obtained by using the radix-4 recording circuit;

[0167] 7) Shift the exponent e to the left;

[0168] 8) Ignore the leading 1 (MSB) of the exponent e and perform the subsequent steps 9) and 10) for each of the following bits;

[0169] 9) Perform steps 4) and 5) for the modular square operation regardless of the value (0 or 1) of the current bit of the exponent e, where the multiplier and the multiplicand required for the square operation are stored in the register A and the register B, respectively;

[0170] 10) If the current bit of the exponent e is 1, perform steps 4) and 5) for the modular multiplication after performing step 9), where the multiplicand is the content of the register B and the multiplier is the base m′ in the exponentiation; and

[0171] 11) Perform the modular multiplication once more using step 4) after performing steps 8) to 10) for all bits of the exponent e, where the multiplicand is the content of the register B and the multiplier is 1.

[0172] The result of applying the CPA to the values remaining in the registers C and S after performing the above steps 1) to 11) is the final value of the exponentiation, m^(e) mod N.
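The control flow of the exponentiation procedure (conversion by P, a square for every exponent bit after the leading 1, a conditional multiplication by m′, and a final multiplication by 1) can be sketched in software. The following Python sketch is a functional model under stated assumptions: the names mont and mont_exp are hypothetical, registers and the CPA are not modelled, the Montgomery product is computed through a modular inverse (Python 3.8 or later), the modulus is odd, and the exponent is at least 1.

```python
def mont(x, y, n_mod, r_bits):
    """Functional model of a Montgomery product: x * y * 2**(-r_bits) mod n_mod."""
    return (x * y * pow(2, -r_bits, n_mod)) % n_mod


def mont_exp(m, e, n_mod, n_bits):
    """Left-to-right binary exponentiation built only from Montgomery products.

    Assumptions: n_mod is odd, e >= 1, and R = 2**(n_bits + 4).
    """
    r_bits = n_bits + 4
    p = pow(2, 2 * r_bits, n_mod)            # pre-calculated P = R^2 mod N
    m_bar = mont(m, p, n_mod, r_bits)        # m' = m * P * R^-1 mod N
    x = m_bar                                # accounts for the leading 1 of e
    for bit in bin(e)[3:]:                   # remaining bits, MSB first
        x = mont(x, x, n_mod, r_bits)        # square for every bit
        if bit == "1":
            x = mont(x, m_bar, n_mod, r_bits)  # multiply by m' when the bit is 1
    return mont(x, 1, n_mod, r_bits)         # final multiplication by 1 removes R


# Example with the 12-bit modulus of Equation 6 and an arbitrary exponent.
value = mont_exp(0x5C3, 0x11, 0xA59, 12)
assert value == pow(0x5C3, 0x11, 0xA59)
```

Keeping every intermediate value in the form x·R mod N means each square or multiplication costs exactly one Montgomery product; only the first and last products convert into and out of that form.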

[0173] C-4. Effect of the Invention

[0174] As is apparent from the above description, the present invention provides a circuit for calculating A·B·2^(−(n+4)) mod N, which makes the general modular multiplication A·B mod N possible. A·B mod N calculated according to the present invention is applicable to hardware apparatuses employable in devices for generating and verifying digital signatures. In addition, the present invention is applicable to hardware apparatuses for defining electronic signatures, authentication and encryption/decryption based on IC cards. Moreover, the present invention can provide devices for encrypting and decrypting data or information by means of an electronic signature apparatus for performing the modular multiplication. Furthermore, the present invention can be used to implement existing public key cryptography systems such as NIST-DSS, RSA, ElGamal, and Schnorr electronic signatures, based on the electronic signature apparatus.

[0175] D. Example of Application of the Invention.

[0176] FIG. 13 is a block diagram of an IC card capable of performing encryption and electronic signature by using the Montgomery type modular multiplication apparatus disclosed in the present application.

[0177] In FIG. 13, a central processing unit (CPU) 310 decodes instructions to perform encryption, authentication and electronic signature, and provides control signals and data required for a modular calculation to a coprocessor 330. A read only memory (ROM) 350 contains a security module for securing data, for example, a key required for encryption and electronic signature. Control logic 320 and a random access memory (RAM) 340 are also shown, and provide the logic and memory needed to perform the above operations.

[0178] Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.

What is claimed is:
 1. A modular multiplication device for implementing an information encryption/decryption technique in which message (A) is encrypted/decrypted using a first key (B) and a second key (N), comprising: a storage device for storing the message, the first key, and the second key, each being n bits in length; a recording logic for generating at each clock a first n+4 bit signal using the message and the first key; a first carry save adder for generating a 3 bit sequence consisting of one carry value and two sum values using the first n+4 bit signal and two parallel n+4 bit input signals; a quotient logic for generating a 3 bit determiner for determining a modular reduction multiple using the 3 bit sequence and one carry value; a selector for generating a second n+4 bit signal using the second key and the 3 bit determiner; a second carry save adder for generating a pair of sum values and a pair of carry values using the second n+4-bit signal, and respective sum and carry terms outputted from the first carry adder; and a first full adder for generating a carry input value by performing a full addition operation with the pair of sum and carry values and a carry value (cin) outputted from the quotient logic at a previous clock.
 2. The device of claim 1, wherein the storage device contains shift registers for storing the respective message, first key, and second key.
 3. The device of claim 1, wherein the message is right-shifted by 2 bit positions at every clock.
 4. The device of claim 1, wherein the first n+4 bit signal is one of 0, B, 2B, −B, and −2B.
 5. The device of claim 1, wherein the second n+4 bit signal is one of 0, N, 2N, −N, and −2N.
 6. The device of claim 1, wherein the recording logic comprises: a Booth recording circuit for performing Booth recoding with the two least significant bits of the message; a multiplexer for multiplexing the two least significant bits and the first key, and outputting one of 0, B, and 2B; and a one's complementary operator for performing a one's complementary operation on the n+1 bit signal outputted from the multiplexer according to the two least significant bits and generating one of 0, B, 2B, −B, and −2B.
 7. The device of claim 1, wherein the first carry save adder comprises n+4 second full adders, each performing a full addition operation with corresponding sum and carry bits of the two parallel n+4 input signals and corresponding bit of the first n+4 bit signal, and producing the 3 bit sequences.
 8. The device of claim 7, wherein a first one of the two parallel n+4 bit input signals is created by selecting the most significant n+2 bits from a sum term of the second carry save adder and inserting 2 bits as the most significant bits of the selected n+2 bits.
 9. The device of claim 8, wherein the two most significant bits are zeros.
 10. The device of claim 8, wherein a second one of the two parallel n+4 bit input signals is created by selecting the most significant n+3 bits from a carry term of the second carry save adder and inserting one bit as the most significant bit of the selected n+3 bits.
 11. The device of claim 10, wherein the one most significant bit is zero.
 12. The device of claim 1, wherein the quotient logic comprises: a D flip-flop for temporarily storing the carry input value from the first full adder; a third full adder for performing a full addition operation on the carry input value, a sum value outputted from a least significant full adder of the first carry save adder, and a sign bit of the first n+4 bit signal; an exclusive OR (XOR) logic gate for performing an exclusive OR operation on the carry value outputted from the least significant full adder of the first carry save adder, a sum value outputted from a second least significant full adder of the first carry save adder, and the carry value of the third full adder; and a combinational circuit for combining the outputs of the third full adder, exclusive OR logic gate and a second least significant bit of the second key, and outputting the 3 bit determiner signal.
 13. The device of claim 1, wherein the second carry save adder comprises n+4 fourth full adders, each performing a full addition operation with corresponding sum and carry bits from the first carry save adder except for a least significant sum bit and a most significant carry bit and a corresponding bit of the second n+4 bit signal, and producing the pairs of sum and carry values.
 14. The device of claim 1, wherein the first full adder performs a full addition with a sum value output from a second least significant full adder of the second carry save adder, a carry value outputted from a least significant full adder of the second carry save adder, and a carry value (cin) outputted from the quotient logic at a previous clock, and produces the carry input value.
 15. The device of claim 1 further comprising a carry propagation adder for performing a carry propagation addition operation with the sum and carry terms outputted from the second carry save adder after m+2 clocks, where m=n/2.
 16. The device of claim 15, wherein the carry propagation adder adds modulus second key to a result of the carry propagation addition operation if an output of the carry propagation adder is a negative value.
 17. A modular multiplication device for implementing a message encryption/decryption technique in which the message (A) is encrypted/decrypted using a first key (B) and a second key (N), comprising: a storage device for storing the message, the first key, and the second key, each of n bits in length; a recording logic for generating at each clock a first n+3 bit signal using the message and first key; a first carry save adder for outputting a 3 bit sequence consisting of one carry value and two sum values by performing a first carry save addition operation with the first n+3 bit signal and two parallel n+3 bit input signals; a quotient logic for generating a 2 bit determiner for determining a modular reduction multiple by performing a quotient operation with the 3 bit sequence and the one carry value; a selector for generating a second n+3 bit signal using the second key and the 2-bit determiner; a second carry save adder for outputting a pair of sum values and a pair of carry values by performing a second carry save addition operation with the second n+3-bit signal, and respective sum and carry terms outputted from the first carry addition operation; an AND logic gate for outputting a carry input value by performing an AND operation with the pair of sum and carry values.
 18. The device of claim 17, wherein the storage device contains shift registers for storing the respective message, first key, and second key.
 19. The device of claim 18, wherein the message is shifted by 2 positions at every clock.
 20. The device of claim 17, wherein the first n+3 bit signal is one of 0, B, 2B, and 3B.
 21. The device of claim 17, wherein the second n+3 bit signal is one of 0, N, 2N, and 3N.
 22. The device of claim 17, wherein the recording logic is a multiplexer which multiplexes two least significant bits of the message and n bits of first key, and outputs the first n+3 bit signal.
 23. The device of claim 12, wherein the first carry save adder comprises n+3 first full adders, each performing a full addition operation with corresponding sum and carry bits of the two parallel n+3 bit input signals and a corresponding bit of the first n+3 bit signal, and outputting the 3 bit sequence.
 24. The device of claim 23, wherein a first one of the two parallel n+3 bit input signals is created by selecting the most significant n+1 bits from a sum term of the second carry save adder and inserting two bits as the most significant bits of the selected n+1 bits.
 25. The device of claim 24, wherein the two most significant bits are zeros.
 26. The device of claim 24, wherein a second one of the two parallel n+3 bit input signals is created by selecting the most significant n+2 bits from a carry term of the second carry save adder and inserting 1 bit as the most significant bit of the selected n+2 bits.
 27. The device of claim 26, wherein the one most significant bit is zero.
 28. The device of claim 17, wherein the quotient logic comprises: a D flip-flop for temporarily storing the carry input value from the AND logic gate; a half adder for performing a half addition operation with the carry input value and a sum value output from a most significant full adder of the first carry save adder; an exclusive OR (XOR) logic gate for performing an exclusive OR operation with a carry value output from the least significant full adder of the first carry save adder, a sum value output from a second least significant full adder, and an output of the half adder; a combinational circuit for combining the outputs of the half adder and the exclusive OR logic gate and a second least significant bit (n1) of the second key and outputting the 2 bit determiner signal.
 29. The device of claim 17, wherein the second carry save adder comprises: n+3 second full adders, each performing a full addition operation with corresponding sum and carry bits from the first carry save adder, except for a least significant sum bit and the most significant carry bit, and a corresponding bit of the second n+3 bit signal, and producing the pairs of sum and carry bits.
 30. The device of claim 17, wherein the AND logic gate performs an AND operation with a sum value output from a second least significant second full adder of the second carry save adder and a carry value output from a least significant second full adder of the second carry save adder, and produces the carry input value.
 31. The device of claim 17 further comprising a carry propagation adder for performing carry propagation addition operation with the sum and carry terms output from the second carry save adder after m+2 clocks, where m=n/2.
 32. A modular multiplication method for implementing a message encryption/decryption technique in which a message (A) is encrypted/decrypted using a first key (B) and a second key (N), comprising: storing the message, first key, and second key each of n bits; generating a first n+4 bit signal using the message and first key at each clock; outputting a 3 bit sequence consisting of one carry value and two sum values by performing a first carry save addition operation with the first n+4 bit signal and two parallel n+4-bit input signals; generating a 3-bit determiner for determining a modular reduction multiple by performing a quotient operation with the 3 bit sequence and one input carry value; generating a second n+4 bit signal using the second key and the 3 bit determiner; outputting a pair of sum values and a pair of carry values by performing a second carry save addition operation with the second n+4 bit signal, and respective sum and carry terms output from the first carry save addition operation; outputting a carry input value by performing a full addition operation with the pair of sum and carry values and a carry value outputted from the quotient logic at a previous clock.
 33. The method of claim 32, wherein the message is right-shifted by 2 bits at every clock.
 34. The method of claim 32, wherein generating the first n+4 bit signal includes: performing Booth recording with the two least significant bits of the message; and generating one of 0, B, 2B, −B, and −2B according to the two least significant bits.
 35. The method of claim 32, wherein a first one of the two parallel n+4-bit input signals is created by selecting the most significant n+2 bits from a sum term outputted by the second carry save addition operation and inserting 2 bits as the most significant bits of the selected n+2 bits.
 36. The method of claim 32, wherein the two most significant bits are zeros.
 37. The method of claim 32, wherein a second one of the two parallel n+4 input signals is created by selecting the most significant n+3 bits from a carry term of the second carry save addition operation and inserting one bit as the most significant bit of the selected n+3 bits.
 38. The method of claim 32, wherein the one most significant bit is zero.
 39. The method of claim 32, wherein the 3 bit sequence includes two sum values and one carry value.
 40. The method of claim 39, wherein the two sum values are a least significant bit and a second least significant bit of a sum term output from the first carry save addition operation.
 41. The method of claim 39, wherein the one carry value is a least significant bit of a carry term output from the first carry save addition operation.
 42. The method of claim 32, wherein the one input carry value is the carry input value generated by the full addition operation.
 43. The method of claim 32, wherein the second n+4 bit signal is selected from among 0, N, −N, and −2N according to the two least significant bits of the 3 bit determiner.
 44. The method of claim 32, wherein the pair of sum and carry values are a second least significant bit of a sum term and a least significant bit of the carry term outputted from the second carry save addition operation.
 45. The method of claim 32, wherein the most significant bits of the sum and carry terms outputted from the first carry save addition operation are ignored.
 46. The method of claim 32, further comprising performing a carry propagation addition operation with the sum and carry terms after m+2 clocks, where m=n/2.
 47. The method of claim 46, further comprising adding a modulus second key if an output of the carry propagation addition operation is a negative value.
 48. A modular multiplication method for implementing a message encryption/decryption technique in which message (A) is encrypted/decrypted using a first key (B) and a second key (N), comprising: storing the message, first key, and second key, each of n bits in length; generating a first n+3 bit signal using the message and first key at each clock; outputting a 3 bit sequence consisting of one carry value and two sum values by performing a first carry save addition operation with the first n+3 bit signal and two parallel n+3 bit input signals; generating a 2 bit determiner for determining a modular reduction multiple by performing a quotient operation with the 3 bit sequence and one input carry value; generating a second n+3 bit signal using the second key and the 2 bit determiner; outputting a pair of sum values and a pair of carry values by performing a second carry save addition operation with the second n+3 bit signal, and respective sum and carry terms outputted from the first carry addition operation; outputting a carry input value by performing an AND operation with the pair of sum and carry values.
 49. The method of claim 48, wherein the message is right-shifted by 2 bits at every clock.
 50. The method of claim 48, wherein the first n+3 bit signal is produced by multiplexing the two least significant bits of the message and the first key.
 51. The method of claim 48, wherein the first n+3 bit signal is one of 0, B, 2B, and 3B.
 52. The method of claim 48, wherein a first one of the two parallel n+3 bit input signals is created by selecting the most significant n+1 bits from a sum term of the second carry save addition operation and inserting two bits as the most significant bits of the selected n+1 bits.
 53. The method of claim 52, wherein the two most significant bits are zeros.
 54. The method of claim 52, wherein a second one of the two parallel n+3 bit input signals is created by selecting the most significant n+2 bits from a carry term of the second carry save addition operation and inserting 1 bit as the most significant bit of the selected n+2 bits.
 55. The method of claim 54, wherein the one most significant bit is zero.
 56. The method of claim 48, wherein the 3 bit sequence includes two sum values and one carry value.
 57. The method of claim 56, wherein the two sum values are a least significant bit and a second least significant bit of a sum term output from the first carry save addition operation.
 58. The method of claim 56, wherein the one carry value is a least significant bit of a carry term output from the first carry save addition operation.
 59. The method of claim 48, wherein the one input carry value is the carry input value generated by the AND operation.
 60. The method of claim 48, wherein the second n+3 bit signal is selected from among 0, N, 2N, and 3N according to the 2-bit determiner.
 61. The method of claim 48, wherein the pair of sum and carry values are a second least significant bit of a sum term and a least significant bit of the carry term output from the second carry save addition operation.
 62. The method of claim 48, wherein the most significant bits of the sum and carry terms output from the first carry save addition operation are ignored.
 63. The method of claim 48, further comprising performing a carry propagation addition operation with sum and carry terms output from the second carry save addition operation after m+2 clocks, where m=n/2.